A Position Paper: Value of Information for Evidence Detection
Abstract
In real-world applications, evidence detection involves evaluating a body of existing information from time-evolving, multi-modal data sources. It seems obvious that approaches to evidence detection should consider the relative quality of data sources with respect to the value of the information being produced over time. For instance, considering all data sources equally reliable can yield undesirable results. We highlight the distinction between the traditional value of information problem, where information is pulled from sources, and the value of information for evidence detection problem, where information is pushed from the sources. We further comment on how this distinction enables new qualities of information to be measured and characterized. In this paper, we address the following questions: What should value of information mean for evidence detection? What are the components needed to characterize value of information? How should these components be measured and combined to compute a value for information? Finally, how should value of information be used in evidence detection? We develop a framework for implementing value of information for evidence detection and present the results of a preliminary feasibility study.

*Work performed at Lawrence Livermore National Laboratory.

Introduction and Background

The sheer size and complexity of data sets in real-world applications have prompted many efforts on the management and analysis of large complex networks. These networks store multi-source data with multi-modal and multi-relational properties in a human-understandable way. One of the associated challenges is to find evidence in support of or against a particular hypothesis. It is common to try to detect patterns in the data and use these patterns to perform inference or update beliefs in a set of hypotheses. The published literature is full of algorithms for pattern matching, pattern mining, link analysis, and many others on such networks (Gallagher 2006; Getoor & Diehl 2005; Washio & Motoda 2003). A logical next step is to take these algorithms further and use their patterns/results as evidence.

In this paper, we argue that any approach to evidence detection should consider the value of information (VOI) for the output of the data sources. Without such measurement, all sources and their data are considered "equal." This assumption is clearly false. For instance, using patterns that were generated from an unreliable data source is clearly not desirable. Our proposed VOI framework enables evaluation of detected patterns based on their qualities. Throughout the remainder of this paper, we highlight related work from a number of fields. We provide an idealized definition of VOI as well as a technique for estimating it. We conclude by presenting some of the results obtained in our preliminary feasibility study and discussing future directions for continuing this research.

The problem of determining VOI is well studied in various fields dating back to the 1940s. All existing approaches solve a variant of the following problem: given a set of sources, which is the best (or best set) to obtain an observation (or a set of observations) from? In other words, an agent must determine the optimal "activation schedule" for the sources of information to maximize (or minimize) some objective.
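As a concrete illustration of this traditional pull formulation, the following sketch (our own, with assumed utilities and source characteristics, not an example from the paper) computes the expected value of querying a single source about a binary hypothesis: the expected utility of acting after the observation minus the expected utility of acting on the prior alone.

```python
# Minimal sketch of the classic "pull" value-of-information calculation.
# The actions, utilities, and source likelihoods below are illustrative assumptions.

def expected_utility(belief, utilities):
    """Best achievable expected utility given a belief P(hypothesis is true)."""
    return max(
        belief * u_if_true + (1.0 - belief) * u_if_false
        for (u_if_true, u_if_false) in utilities.values()
    )

def posterior(prior, likelihood_true, likelihood_false, observation):
    """Bayes update of P(hypothesis is true) after seeing the source's report."""
    p_obs_true = likelihood_true if observation else 1.0 - likelihood_true
    p_obs_false = likelihood_false if observation else 1.0 - likelihood_false
    num = p_obs_true * prior
    return num / (num + p_obs_false * (1.0 - prior))

def value_of_querying(prior, likelihood_true, likelihood_false, utilities):
    """Expected VOI of pulling one observation from the source before acting."""
    baseline = expected_utility(prior, utilities)
    value_with_observation = 0.0
    for observation in (True, False):
        # Marginal probability of receiving this report.
        p_obs = (likelihood_true if observation else 1.0 - likelihood_true) * prior + \
                (likelihood_false if observation else 1.0 - likelihood_false) * (1.0 - prior)
        post = posterior(prior, likelihood_true, likelihood_false, observation)
        value_with_observation += p_obs * expected_utility(post, utilities)
    return value_with_observation - baseline

if __name__ == "__main__":
    # "act" pays off if the hypothesis is true and costs us if it is false; "wait" is neutral.
    utilities = {"act": (10.0, -5.0), "wait": (0.0, 0.0)}
    # A source that reports "positive" 80% of the time when the hypothesis is true
    # and 30% of the time when it is false.
    print(value_of_querying(prior=0.4, likelihood_true=0.8,
                            likelihood_false=0.3, utilities=utilities))
```

With these numbers the query is worth about 1.3 units of utility, so a pull-style agent would choose to activate this source; the point is that the valuation is made before any report is inspected.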
There are a number of approaches to solving this problem, based mostly on decision theory and/or information theory. However, measuring VOI for evidence detection differs from previous work in several ways. First, prior work typically makes inherent assumptions about the reliability of information (Horvitz & Rutledge 1991; McCarthy 1956; Shannon 1948). In particular, the traditional approaches of information and decision theory assume sources to be fully reliable. Data for evidence detection is often not fully reliable, or even relevant. It originates from multi-modal sources, each with varying characteristics that can change as the world evolves.

Second, previous work typically characterizes the value of querying an information source, which is a pull problem (as data is pulled from sources) (Heckerman, Horvitz, & Middleton 1993; Zhang, Ji, & Looney 2002). For the purpose of evidence detection, we are interested in understanding how to interpret data that we have already obtained or that has been pushed to us.

To understand this distinction, consider the following situation. You are buying a new car from a company that is known to produce a high-quality product. Unfortunately, the company has redesigned the car for this year and you do not know if it is up to the usual standards. There are two ways you can proceed. One is to proceed under the assumption that the company's reputation is sufficient and they will likely not produce a bad car. Alternatively, you can test drive the car and determine if the product is consistent with what you know about the company. The first case, when you make the decision solely based on reputation, is an example of a pull problem. The second case, when you decide after test driving the vehicle, is an example of a push problem. The key distinction is whether the value of what the source (or company) produces is determined prior to (pull) or after (push) inspection of the product (or information). Our position is that in detecting evidence within a body of existing information, push methods are what need to be used. Using the methods developed for the pull problem, where the current information is not inspected when evaluating quality or value, imposes an artificial handicap on the detection of evidence. The pull approach is warranted when there is a cost for obtaining information, as is typically assumed in decision theory. Since evidence is available without cost, we can exploit the opportunity to inspect the current information and develop a measure of VOI more attuned to shifts in information quality.

Third, we found little work that attempts to learn VOI and/or its components across multiple sources over time. Generally speaking, the majority of work in this area comes from the Information Fusion community (Rogova & Nimier 2004). The goal in information fusion is to combine multiple sources of information into one coherent representation. Often, the pre-fusion information has missing values, pertains to disjoint concepts, or is unreliable. All of these, as well as other properties of the pre-fusion information, must be taken into account when designing a fusion operator. Moreover, existing approaches in the fusion community, such as Delmotte, Dubois, and Borne's (1996), generally do not involve learning to characterize the quality of data. In that line of work, it is noted that the quality of knowledge produced by fusion is influenced by the adequacy of the data, the quality of the uncertainty model, and the quality of the prior knowledge.
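To make the fusion setting concrete, the following minimal sketch (our own illustration; the linear opinion pool and the reliability weights are assumptions, not a method from the cited work) combines per-source probability reports into one belief, discounting each report by a reliability weight, and shows how the result shifts when all sources are treated as equal.

```python
# Minimal sketch of a reliability-weighted fusion operator (a linear opinion pool).
# The operator and the example weights are illustrative assumptions.

def fuse(reports, reliabilities):
    """Combine per-source probability reports for the same hypothesis into one belief,
    discounting each report by its source's reliability weight in [0, 1]."""
    if len(reports) != len(reliabilities):
        raise ValueError("need one reliability weight per report")
    total_weight = sum(reliabilities)
    if total_weight == 0:
        raise ValueError("at least one source must have nonzero reliability")
    return sum(r * w for r, w in zip(reports, reliabilities)) / total_weight

if __name__ == "__main__":
    # Three sources report P(hypothesis); the third is known to be unreliable.
    reports = [0.9, 0.8, 0.1]
    reliabilities = [0.9, 0.7, 0.2]
    print(fuse(reports, reliabilities))    # ~0.77, dominated by the reliable sources
    print(fuse(reports, [1.0, 1.0, 1.0]))  # 0.60, the "all sources are equal" result
```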
Much of this fusion work has focused on improving the uncertainty model and has completely ignored the reliability of information. When reliability is considered, two measures are discussed: 1) the relative stability of the first-order uncertainty; and 2) the accuracy of the beliefs. It is assumed that the fusion operator will not introduce any residual uncertainty that is not due to the data itself. A fair amount of research has been devoted to the incorporation of reliability into fusion rules. In this research, the reliability measure comes in one of three forms: 1) it is encoded by external sources (e.g., context or an expert); 2) it is learned using training data; or 3) it is constructed based on agreement of sources, or consensus (Delmotte, Dubois, & Borne 1996; Parra-Loera, Thompson, & Salvi 1991). It is not, however, estimated prior to fusion. Historically, consensus models of reliability have taken one of the following two forms: 1) a degree of deviation between the measurements of each source and the fusion result (e.g., the posterior belief); or 2) a measure of "inner trust" based on a pairwise degree of "likeliness" of agreement (or consensus) between sources. While this work is interesting and in some cases can improve inference performance, there is a notable problem with consensus measures of reliability. Specifically, lack of consensus is sufficient for low reliability but not necessary. Further, for VOI to be useful for evidence detection, it must be computed prior to fusion, not as a function of its output.

Measuring VOI for evidence enables a solution to the problem of detecting the information (i.e., evidence) with the highest qualitative value to either confirm a true hypothesis or disconfirm a false hypothesis (for a discussion of the degree to which a hypothesis is confirmed or disconfirmed, see Fitelson 2001b). VOI for evidence detection should: 1) capture information from possibly unreliable sources; 2) characterize the value of a body of existing information (a push problem); and 3) be tailored to improve inference and allow data triage.

Figure 1: VOI Framework for Evidence Detection

Figure 1 depicts our framework for VOI in evidence detection. At the top of the Figure is a set of sources that are pushing reports (or information). The different icons in the sources are intended to represent multi-modal data. The reports they generate evolve over time as the environment evolves. In the center of the Figure is an f inside a circle. To its left is a representation of a set of beliefs (i.e., prior knowledge or knowledge obtained earlier) that evolve over time, and to its right is the set of hypotheses that are (possibly) evolving over time. The f represents the combination of information, beliefs, and hypotheses into a quality metric. Note that this is not the same as fusion. In the fusion framework, the information from the sources would be combined to produce "manipulated" data (we return to this concept later). In our VOI framework, the information from the sources is combined to produce a measure of quality. This measure of quality can be used to detect evidence, inform inference, or even inform fusion. This quality (or VOI) measure is represented by the thumbs-up/thumbs-down icon at the bottom of the Figure and generally lies somewhere in between the extremes of useless and useful (i.e., it is not a binary measure of goodness). We seek a learning algorithm (such as regression) to estimate this measure of quality.

Note that even with anonymized data, one needs to evaluate ...
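As a rough sketch of how the f in Figure 1 might be realized (our own reading of the framework; the feature choices, training signal, and use of linear regression are assumptions rather than the authors' implementation), each pushed report is described by features relating it to its source, the current beliefs, and a hypothesis, and a learned regressor maps those features to a quality score lying between useless and useful.

```python
# Rough sketch of the quality function f in Figure 1: map (report, beliefs, hypothesis)
# to a quality score with a learned regressor, computed prior to any fusion step.
# Feature choices, training labels, and the model are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression

def features(report, beliefs, hypothesis):
    """Turn one pushed report into a feature vector.
    Assumed features: the source's reliability so far, the report's relevance to the
    hypothesis, and the report's agreement with the current belief in the hypothesis."""
    return np.array([
        report["source_reliability"],
        report["relevance"],
        1.0 - abs(report["claim"] - beliefs[hypothesis]),
    ])

# Hypothetical training data: past reports whose usefulness for inference was later judged.
train_reports = [
    {"source_reliability": 0.9, "relevance": 0.8, "claim": 0.9},
    {"source_reliability": 0.4, "relevance": 0.9, "claim": 0.1},
    {"source_reliability": 0.7, "relevance": 0.2, "claim": 0.6},
]
beliefs = {"H1": 0.7}
X = np.stack([features(r, beliefs, "H1") for r in train_reports])
y = np.array([0.9, 0.3, 0.4])  # judged quality of each past report, in [0, 1]

voi_model = LinearRegression().fit(X, y)

# Score a newly pushed report before (and independently of) fusion or inference.
new_report = {"source_reliability": 0.8, "relevance": 0.7, "claim": 0.75}
quality = float(voi_model.predict(features(new_report, beliefs, "H1").reshape(1, -1))[0])
print("estimated VOI:", quality)
```

The score can then be used as described above: to triage reports, to weight them during inference, or to inform a downstream fusion operator.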